# Multimodal Document Processing

Smoldocling 256M Preview Mlx Bf16 Docling Snap

A 256M-parameter preview release of a document understanding model, designed for document structure parsing and content extraction. It converts document images into structured data.

Tags: Transformers, English

Udop Large 512 300k

UDOP is a universal document processing model, based on the T5 architecture, that unifies vision, text, and layout. It is suited to document AI tasks such as document image classification, parsing, and visual question answering.

© 2025 AIbase